Modern biotechnologies often result in high-dimensional data sets with many more variables than observations ($n \ll p$). These data sets pose new challenges to statistical analysis, and variable selection becomes one of the most important tasks in this setting. We assess stability selection, a recently proposed flexible framework for variable selection. By the use of resampling procedures, stability selection adds finite-sample error control to high-dimensional variable selection procedures such as the Lasso or boosting. We consider the combination of boosting and stability selection and present results from a detailed simulation study that provides insights into the usefulness of this combination. Limitations are discussed, and guidance on the specification and tuning of stability selection is given. We elaborate on the interpretation of the error bounds used and give insights for practical data analysis. The results are used to detect differentially expressed phenotype measurements in patients with autism spectrum disorders. All methods are implemented in the freely available R package stabs.
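As an illustration of the kind of error bound referred to above, the following sketch states the bound from the original stability selection proposal by Meinshausen and Bühlmann. The abstract does not say which bounds are used, so this choice and the symbols are assumptions: $V$ is the number of falsely selected variables, $q$ the average number of variables selected per subsample, $p$ the total number of variables, and $\pi_{\mathrm{thr}} \in (1/2, 1)$ the selection-frequency threshold. The bound holds under an exchangeability condition on the noise variables and requires the base selection procedure to be no worse than random guessing.

\[
\mathbb{E}(V) \;\le\; \frac{1}{2\pi_{\mathrm{thr}} - 1} \cdot \frac{q^{2}}{p}
\]

For example, with the hypothetical values $p = 1000$, $q = 20$, and $\pi_{\mathrm{thr}} = 0.9$, the bound gives $\mathbb{E}(V) \le 400 / (0.8 \cdot 1000) = 0.5$, so on average at most half a variable is expected to be falsely selected. Because the three quantities are tied together by this inequality, fixing any two of them determines the third.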